TargetFinder and Annotator: a Simple Approach for Finding Full-length Target cDNAs and for Annotating EST Sequences
Abstract
In a large-scale EST (expressed sequence tag) or cDNA sequencing project, it is often desirable to know whether the ESTs identify genes of interest and whether the cloned cDNAs include intact coding regions (that is, whether they are full-length). In this work, we present two Perl tools, TargetFinder and Annotator. TargetFinder automates the identification of full-length cDNAs from assembled EST sequences, including singletons and contigs. Annotator annotates ESTs and their assembled sequences by assigning a provisional function to each sequence and predicting whether it includes an intact coding region. Both programs use the output of BLASTX to predict the correct reading frame, search for a putative start codon, and predict whether a query sequence includes an intact ORF. They also predict whether the sequence of the coding region of a cDNA clone or contig is complete. Using our own Aspergillus niger EST data, TargetFinder rapidly and accurately found full-length target genes within a large set of assembled ESTs, and Annotator functionally annotated the ESTs and their assembled sequences.
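The BLASTX-guided check described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Perl code: the function name `classify_orf`, its parameters, and the rule of taking the most upstream in-frame ATG are assumptions made for illustration. The sketch assumes a forward-strand hit whose query coordinates (1-based, inclusive, spanning whole codons) have already been parsed from the BLASTX report.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_orf(seq, q_start, q_end):
    """Guess whether a query contains an intact ORF around a BLASTX hit.

    seq     : nucleotide sequence of the contig or singleton
    q_start : 1-based position of the first aligned base (forward strand)
    q_end   : 1-based position of the last aligned base (inclusive)
    All names and the decision rule are hypothetical, for illustration only.
    """
    seq = seq.upper()
    # The reading frame (1, 2 or 3) follows from where the alignment starts.
    frame = (q_start - 1) % 3 + 1

    # Walk upstream codon by codon from the alignment start, remembering
    # the most upstream ATG seen before an in-frame stop codon.
    start = None
    pos = q_start - 1  # 0-based index of the first aligned base
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            start = pos
        pos -= 3

    # Walk downstream from the end of the alignment to the first
    # in-frame stop codon.
    stop = None
    for pos in range(q_end, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            stop = pos
            break

    return {
        "frame": frame,
        "start": start,  # 0-based index of the putative start codon, or None
        "stop": stop,    # 0-based index of the stop codon, or None
        "full_length": start is not None and stop is not None,
    }


# Toy query: the hit covers GCT GCA, flanked by an upstream ATG and a
# downstream TGA in the same frame.
example = "GG" + "TAA" + "ATG" + "GCT" + "GCA" + "AAA" + "TGA" + "CC"
print(classify_orf(example, q_start=9, q_end=14))
```

A query that lacks an in-frame ATG upstream of the hit is reported with `start` set to `None`, which in this sketch marks it as 5'-truncated rather than full-length. A real implementation would also need to handle reverse-strand frames and frameshifts, which this sketch ignores.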
Similar resources
Full-malaria 2004: an enlarged database for comparative studies of full-length cDNAs of malaria parasites, Plasmodium species
Full-malaria (http://fullmal.ims.u-tokyo.ac.jp), a database for full-length cDNAs from the human malaria parasite Plasmodium falciparum, has been updated in at least three respects. (i) We added 8934 sequences generated from the addition of new libraries, so that our collection of 11,424 full-length cDNAs covers 1375 (25%) of the estimated number of the entire 5409 parasite genes. (ii) All of our...
TargetIdentifier: a webserver for identifying full-length cDNAs from EST sequences
TargetIdentifier is a webserver that identifies full-length cDNA sequences from expressed sequence tag (EST)-derived contig and singleton data. To accomplish this, TargetIdentifier uses BLASTX alignments as a guide to locate protein coding regions and potential start and stop codons. This information is then used to determine whether the EST-derived sequences include their translation start ...
cDNA sequences for transcription factors and signaling proteins of the hemichordate Saccoglossus kowalevskii: efficacy of the expressed sequence tag (EST) approach for evolutionary and developmental studies of a new organism.
We describe a collection of expressed sequence tags (ESTs) for Saccoglossus kowalevskii, a direct-developing hemichordate valuable for evolutionary comparisons with chordates. The 202,175 ESTs represent 163,633 arrayed clones carrying cDNAs prepared from embryonic libraries, and they assemble into 13,677 continuous sequences (contigs), leaving 10,896 singletons (excluding mitochondrial sequence...
Identification of functional short non-coding RNAs using bioinformatics methods in sheep and goat
MicroRNAs (miRNAs) are small non-coding RNAs that have functional roles in post-transcriptional modification. They regulate gene expression by an RNA interfering pathway through cleavage or inhibition of the translation of target mRNA. Numerous miRNAs have been described for their important functions in developmental processes in numerous animals, but there is limited information about sheep an...
Sequencing and Analysis of Approximately 40 000 Soybean cDNA Clones from a Full-Length-Enriched cDNA Library
A large collection of full-length cDNAs is essential for the correct annotation of genomic sequences and for the functional analysis of genes and their products. We obtained a total of 39,936 soybean cDNA clones (GMFL01 and GMFL02 clone sets) in a full-length-enriched cDNA library which was constructed from soybean plants that were grown under various developmental and environmental conditions....
Publication date: 2003